Policy-Guided Heuristic Search with Guarantees

Authors

Laurent Orseau, Levi H. S. Lelis

Abstract

The use of a policy and a heuristic function for guiding search can be quite effective in adversarial problems, as demonstrated by AlphaGo and its successors, which are based on the PUCT search algorithm. While PUCT can also be used to solve single-agent deterministic problems, it lacks guarantees on its search effort and it can be computationally inefficient in practice. Combining the A* algorithm with a learned heuristic function tends to work better in these domains, but A* and its variants do not use a policy. Moreover, the purpose of using A* is to find solutions of minimum cost, while we seek instead to minimize the search loss (e.g., the number of search steps). LevinTS is guided by a policy and provides guarantees on the number of search steps that relate to the quality of the policy, but it does not make use of a heuristic function. In this work we introduce Policy-guided Heuristic Search (PHS), a novel search algorithm that uses both a heuristic function and a policy and has theoretical guarantees on the search loss that relates to the quality of both. We show empirically on the sliding-tile puzzle, Sokoban, and a puzzle from the commercial game `The Witness' that PHS enables the rapid learning of both a policy and a heuristic function and compares favorably with A*, Weighted A*, Greedy Best-First Search, LevinTS, and PUCT in terms of number of problems solved and search time in all three domains tested.
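The abstract describes combining a heuristic h with a policy π inside a best-first search. The minimal sketch below illustrates one way such a combination can be realized: nodes are expanded in order of the priority (g(n) + h(n)) / π(n), where π(n) is the product of the policy's action probabilities along the path to n. This is only an illustrative sketch of the idea, not the authors' implementation; the function names, the successor interface, and the toy graph in the usage note are all assumptions.

```python
import heapq
import math

def policy_guided_search(start, successors, h, is_goal, max_expansions=10**5):
    """Best-first search with priority (g(n) + h(n)) / pi(n).

    `successors(s)` is assumed to yield (child, edge_cost, policy_prob)
    triples; `h(s)` is a heuristic estimate; pi(n) is accumulated as a
    product of policy probabilities (tracked in log space for stability).
    """
    counter = 0  # tie-breaker so heapq never compares states
    frontier = [(0.0, counter, start, 0.0, 0.0, [start])]
    best_g = {}
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, s, g, log_pi, path = heapq.heappop(frontier)
        if is_goal(s):
            return path
        if s in best_g and best_g[s] <= g:
            continue  # already expanded via a path at least as cheap
        best_g[s] = g
        expansions += 1
        for child, cost, p in successors(s):
            child_g = g + cost
            child_log_pi = log_pi + math.log(max(p, 1e-12))
            # priority = (g + h) / pi, with 1/pi computed as exp(-log pi)
            prio = (child_g + h(child)) * math.exp(-child_log_pi)
            counter += 1
            heapq.heappush(frontier, (prio, counter, child, child_g,
                                      child_log_pi, path + [child]))
    return None  # budget exhausted or no goal reachable
```

On a toy graph where the policy strongly prefers one of two equal-cost routes, the search follows the high-probability route first, which is the intended effect of dividing by π(n).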


Related Articles

Guided Policy Search

Direct policy search can effectively scale to high-dimensional systems, but complex policies with hundreds of parameters often present a challenge for such methods, requiring numerous samples and often falling into poor local optima. We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. We show how differential dynam...


Guided Policy Search with Delayed Sensor Measurements

Guided policy search [1] is a method for reinforcement learning that trains a general policy for accomplishing a given task by guiding the learning of the policy with multiple guiding distributions. Guided policy search relies on learning an underlying dynamical model of the environment and then, at each iteration of the algorithm, using that model to gradually improve the policy. This model, t...


Guided Policy Search via Approximate Mirror Descent

Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a “teacher” algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy...


Guided exploration in gradient based policy search with Gaussian processes

Applying reinforcement learning (RL) algorithms in robotic control proves to be challenging even in simple settings with a small number of states and actions. Value-function-based RL algorithms require the discretization of the state and action space, a limitation that is not acceptable in robotic control. The necessity to be able to deal with continuous state-action spaces led to the use of dif...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i14.17469